Abstract

We show how to obtain the mixture connection in an infinite dimensional
information manifold and prove that it is dual to the exponential connection
with respect to the Fisher information. We also define the $\alpha$-connections and
prove that they are convex mixtures of the extremal $(\pm 1)$-connections.

1 Introduction

The search for a fully-fledged infinite dimensional version of Information Geometry
is an ongoing enterprise. The original ideas date back to the work of Amari [1],
Efron and most notably Dawid [?], [5], in which it was suggested using the
natural exponential structure of the space of probability measures to obtain a
manifold structure.

The first mathematically rigorous construction of such a manifold is, however,
due to Pistone and Sempi, using the theory of Orlicz spaces as their most
important analytical tool. In their seminal paper, the notion of exponential
convergence is presented and an atlas consisting of maps with values in Orlicz spaces
is proposed to cover the set of all probability measures equivalent to a given one.
They then prove that the induced topology obtained from this atlas is equivalent
to the one with exponential convergence and use this fact to prove that the atlas
satisfies the conditions necessary in order to give rise to a $C^\infty$ Banach manifold.
The Banach spaces used are the so called exponential Orlicz spaces, denoted by
$L^{\Phi_1}$. We review their construction in section 3, with the difference that we use the
Banach space $M^{\Phi_1}$, the completion of the bounded random variables in the norm
of $L^{\Phi_1}$. We also present a direct proof that the construction yields a $C^\infty$ Banach
manifold without using the concept of exponential convergence.

* Supported by a grant from CAPES-Brazil.

Having defined the information manifold, the next step in the programme is to
define the analogues of a metric and dual connections, ideas that play a leading
role in parametric information geometry. A proposal for exponential and mixture,
as well as for the intermediate $\alpha$-connection, has been advocated by Gibilisco and
Pistone [8]. However, we argue that their elegant definition does not properly
generalise the original ideas from the parametric case. Their connections each act
on a different vector bundle instead of all acting on the tangent bundle as in the
finite dimensional case. The duality observed between them does not involve any
metric, while in parametric information geometry dual connections with respect
to one metric can fail to be dual with respect to an arbitrarily different metric.

We present in section 4 our proposal for the infinite dimensional exponential 
and mixture connections, together with the appropriate concept of duality, as well 
as the generalised metric that makes them dual to each other. In section 5 we 
also show that these definitions reduce to the familiar ones for finite dimensional 
submanifolds and that exponential and mixture families are geodesic for the ex- 
ponential and mixture connections respectively. 

We then move to the subject of $\alpha$-connections in section 6, where we again
rearrange the definitions of [8] in order to have them all acting on the same bundle
and with the desired relation between them, the exponential and the mixture
connections still holding.



2 Orlicz Spaces 

We present here the aspects of the theory of Orlicz spaces that will be relevant for
the construction of the information manifold. Similarly oriented short introductions
to the subject can be found in [13, 12, 8]. For more comprehensive accounts
the reader is referred to the monographs [14] and [?].

Recall first that a Young function is a convex function
$\Phi : \mathbb{R} \to \overline{\mathbb{R}}^+$ satisfying

i. $\Phi(x) = \Phi(-x)$, $x \in \mathbb{R}$,

ii. $\Phi(0) = 0$,

iii. $\lim_{x \to \infty} \Phi(x) = +\infty$.

Note that, in this generality, $\Phi$ can vanish on an interval around the origin
(as opposed to vanishing iff $x = 0$) and it can also happen that $\Phi(x) = +\infty$ for
$0 < x_1 \leq x$, although it must be continuous where it is finite (due to convexity). In
the absence of these annoyances, most of the theorems have stronger conclusions.
This will be the case for the following three Young functions used in information
geometry:

$$\Phi_1(x) = \cosh x - 1,$$
$$\Phi_2(x) = e^{|x|} - |x| - 1, \tag{1}$$
$$\Phi_3(x) = (1 + |x|)\log(1 + |x|) - |x| \tag{2}$$

(in the sequel, $\Phi_1$, $\Phi_2$ and $\Phi_3$ will always refer to these particular functions, with
other symbols being used to denote generic Young functions). Any Young function
$\Phi$ (including those with a jump to infinity) admits an integral representation

$$\Phi(x) = \int_0^x \phi(t)\,dt, \qquad x \geq 0,$$

where $\phi : \mathbb{R}^+ \to \overline{\mathbb{R}}^+$ is nondecreasing, left continuous, $\phi(0) = 0$ and
$\phi(x) = +\infty$ for $x \geq a$ if $\Phi(x) = +\infty$, $x \geq a > 0$.

We define the complementary (conjugate) function to $\Phi$ as the Young function
$\Psi$ given by

$$\Psi(y) = \int_0^y \psi(t)\,dt, \qquad y \geq 0,$$

where $\psi$ is the generalised inverse of $\phi$, that is

$$\psi(s) = \inf\{t : \phi(t) > s\}, \qquad s \geq 0.$$

One can verify that $\Phi_2$ and $\Phi_3$ are a complementary pair.
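The claim about $\Phi_2$ and $\Phi_3$ can be checked numerically. The following is a minimal sketch, not part of the original construction: it uses that $\phi(t) = e^t - 1$ is the derivative of $\Phi_2$ on $t \geq 0$, so its inverse is $\psi(s) = \log(1+s)$, and compares a midpoint-rule integral of $\psi$ with the closed form of $\Phi_3$. The function names `phi2`, `phi3`, `conjugate_by_inverse` and `young_gap` are illustrative choices only.

```python
import math


def phi2(x):
    """Phi_2(x) = e^|x| - |x| - 1."""
    a = abs(x)
    return math.exp(a) - a - 1.0


def phi3(y):
    """Phi_3(y) = (1 + |y|) log(1 + |y|) - |y|."""
    a = abs(y)
    return (1.0 + a) * math.log1p(a) - a


def conjugate_by_inverse(y, steps=20000):
    """Midpoint-rule integral over [0, y] of psi(s) = log(1 + s),
    the inverse of phi(t) = e^t - 1 (the derivative of Phi_2 for t >= 0)."""
    h = y / steps
    return sum(math.log1p((i + 0.5) * h) for i in range(steps)) * h


def young_gap(x, y):
    """Phi_2(x) + Phi_3(y) - x*y for x, y >= 0.
    Young's inequality says this is nonnegative, with equality at y = phi(x)."""
    return phi2(x) + phi3(y) - x * y
```

The gap function vanishes exactly on the curve $y = e^x - 1$, which is the usual signature of a complementary pair.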

Two Young functions $\Psi_1$ and $\Psi_2$ are said to be equivalent if there exist real
numbers $0 < c_1 \leq c_2 < \infty$ and $x_0 \geq 0$ such that

$$\Psi_1(c_1 x) \leq \Psi_2(x) \leq \Psi_1(c_2 x), \qquad x \geq x_0.$$

For example, the functions $\Phi_1$ and $\Phi_2$ are equivalent.

There are several classifications of Young functions according to their growth
properties. The only one we are going to need for the construction of the information
manifold is the so called $\Delta_2$-class. A Young function
$\Phi : \mathbb{R} \to \mathbb{R}^+$ satisfies the $\Delta_2$-condition if

$$\Phi(2x) \leq K\Phi(x), \qquad x \geq x_0 \geq 0,$$

for some constant $K > 0$. Examples of functions in this class are $\Phi(x) = |x|^r$, $r \geq 1$,
and the function $\Phi_3$.

Now let $(\Omega, \Sigma, \mu)$ be a measure space. The theory of Orlicz spaces can be
developed using a general measure $\mu$. However, in several important theorems, to
get necessary and sufficient conditions, instead of just sufficient ones, one needs to
impose a couple of technical restrictions on the measure. In this paper, we are going
to assume without further mention that all our measures have the finite subset
property and are diffuse on a set of positive measure [14, p 46]. The reader must be
aware that some of the results we are going to state do not hold if these conditions
are not assumed and is referred to [14] for the full version of the theorems when
unrestricted measures are considered. The finite subset condition only excludes
pathological cases like $\mu(A) = 0$ if $A = \emptyset$ and $\mu(A) = \infty$ otherwise. It is satisfied,
for instance, by all $\sigma$-finite measures. We also mention that the Lebesgue measure
on the Borel $\sigma$-algebra of $\mathbb{R}^n$ is diffuse on a set of positive measure, as are many
other measures likely to appear in applications of information geometry.

The Orlicz class associated with a Young function $\Phi$ is defined as

$$\tilde{L}^\Phi(\mu) = \left\{ f : \Omega \to \mathbb{R}, \text{ measurable} : \int_\Omega \Phi(f)\,d\mu < \infty \right\}.$$

It is a convex set. However, it is a vector space if and only if the function $\Phi$ satisfies
the $\Delta_2$-condition.

The Orlicz space associated with a Young function $\Phi$ is defined as

$$L^\Phi(\mu) = \left\{ f : \Omega \to \mathbb{R}, \text{ measurable} : \int_\Omega \Phi(\alpha f)\,d\mu < \infty, \text{ for some } \alpha > 0 \right\}.$$

It is easy to prove that this is a vector space and that it coincides with $\tilde{L}^\Phi$ iff
$\Phi$ satisfies the $\Delta_2$-condition. Moreover, if we identify functions which differ only
on sets of measure zero, then $L^\Phi$ is a Banach space when furnished with the
Luxemburg norm

$$N_\Phi(f) = \inf\left\{ k > 0 : \int_\Omega \Phi\left(\frac{f}{k}\right) d\mu \leq 1 \right\},$$

or with the equivalent Orlicz norm

$$\|f\|_\Phi = \sup\left\{ \int_\Omega |fg|\,d\mu : g \in L^\Psi(\mu), \int_\Omega \Psi(g)\,d\mu \leq 1 \right\},$$

where $\Psi$ is the complementary Young function to $\Phi$.
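To make the Luxemburg norm concrete, here is a small sketch on a finite sample space, where the integral becomes a weighted sum. It is only an illustration under stated assumptions: the weights `mu` are taken strictly positive, the default Young function is $\Phi_1(x) = \cosh x - 1$, and the infimum is located by bisection, which works because $k \mapsto \sum_i \Phi(f_i/k)\mu_i$ is nonincreasing in $k$. The name `luxemburg_norm` and its parameters are hypothetical.

```python
import math


def phi1(x):
    """Phi_1(x) = cosh(x) - 1."""
    return math.cosh(x) - 1.0


def luxemburg_norm(f, mu, phi=phi1, tol=1e-12):
    """inf{k > 0 : sum_i phi(f_i / k) * mu_i <= 1} on a finite sample space.
    Assumes every weight mu_i is strictly positive."""
    if all(v == 0 for v in f):
        return 0.0

    def modular(k):
        try:
            return sum(phi(v / k) * m for v, m in zip(f, mu))
        except OverflowError:
            # cosh overflows for large arguments: treat k as infeasible
            return math.inf

    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:      # enlarge hi until it is feasible
        hi *= 2.0
    while hi - lo > tol * hi:     # bisect on the feasibility threshold
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi
```

For a constant function $f \equiv c$ on a probability space the condition reads $\cosh(c/k) - 1 \leq 1$, so the norm is $c/\operatorname{arccosh}(2)$, which gives a quick sanity check of the routine.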

If two Young functions are equivalent, the Banach spaces associated with them
coincide as sets and have equivalent norms. For example, $L^{\Phi_1}(\mu) = L^{\Phi_2}(\mu)$.

A key ingredient in the analysis of Orlicz spaces is the generalised Hölder
inequality. If $\Phi$ and $\Psi$ are complementary Young functions, $f \in L^\Phi(\mu)$, $g \in L^\Psi(\mu)$,
then

$$\int_\Omega |fg|\,d\mu \leq 2 N_\Phi(f) N_\Psi(g).$$

It follows that $L^\Phi \subset (L^\Psi)^*$ for any pair of complementary Young functions, the
inclusion being strict in general.

Suppose now that the measure space is finite. Then it is clear that $L^\infty(\mu) \subset
L^\Phi(\mu)$. Let $E^\Phi$ denote the closure of $L^\infty$ in the $L^\Phi$-norm and define also

$$M^\Phi = \left\{ f \in L^\Phi : \int_\Omega \Phi(kf)\,d\mu < \infty, \text{ for all } k > 0 \right\}.$$

In general, we have that $M^\Phi \subset E^\Phi$. In the next lemma, we collect for later use the
results for the case of a continuous Young function vanishing only at the origin.

We need the following definition first. We say that $f \in L^\Phi(\mu)$ has an absolutely
continuous norm if $N_\Phi(f\chi_{A_n}) \to 0$ for each sequence of measurable sets $A_n \downarrow \emptyset$. In
terms of the Orlicz norm, this is equivalent to the statement that for every $\varepsilon > 0$,
there exists $\delta > 0$ such that

$$\|f\chi_A\|_\Phi = \sup\left\{ \int_A |fg|\,d\mu : g \in L^\Psi(\mu), \int_\Omega \Psi(g)\,d\mu \leq 1 \right\} < \varepsilon$$

provided $A \in \Sigma$ and $\mu(A) < \delta$.

Lemma 3 Suppose that $\mu(\Omega) < \infty$ and let $(\Phi, \Psi)$ be a complementary pair of
Young functions, $\Phi$ continuous, $\Phi(x) = 0$ iff $x = 0$. Then:

i. $M^\Phi = E^\Phi$.

ii. $(M^\Phi)^* = L^\Psi$.

iii. $f \in M^\Phi$ iff $f$ has an absolutely continuous norm.

Furthermore, $M^\Phi$ is separable iff $(\Omega, \Sigma, \mu)$ is separable. If, moreover, $\Phi$ satisfies
the $\Delta_2$-condition, then $M^\Phi = L^\Phi$.

As consequences of this lemma, we obtain $(L^{\Phi_3})^* = L^{\Phi_1}$ and $(M^{\Phi_1})^* = L^{\Phi_3}$.



3 The Pistone-Sempi Information Manifold

Consider the set $\mathcal{M}$ of all the $\mu$-almost everywhere strictly positive probability
densities relative to the measure $\mu$, that is,

$$\mathcal{M} \equiv \mathcal{M}(\Omega, \Sigma, \mu) = \left\{ f : \Omega \to \mathbb{R}, \text{ measurable} : f > 0 \text{ a.e. and } \int_\Omega f\,d\mu = 1 \right\}.$$

For each point $p \in \mathcal{M}$, let $L^{\Phi_1}(p)$ be the exponential Orlicz space over the
measure space $(\Omega, \Sigma, p\,d\mu)$. The measure $p\,d\mu$ inherits all the good properties
assumed for $\mu$ (finite subset property and diffusiveness) in addition to being finite,
so that all the statements from the last section hold for $L^{\Phi_1}(p)$. Instead of using
the whole of $L^{\Phi_1}(p)$ as the model Banach space for the manifold to be constructed,
we restrict ourselves to $M^{\Phi_1}(p) \subset L^{\Phi_1}(p)$ and take its closed subspace of $p$-centred
random variables

$$B_p = \left\{ u \in M^{\Phi_1}(p) : \int_\Omega u p\,d\mu = 0 \right\}$$

as the coordinate Banach space.

For definiteness, we choose to work with the Orlicz norm $\|\cdot\|_{\Phi_1}$, although
everything could be done with the equivalent Luxemburg norm $N_{\Phi_1}$, and use the
notation $\|\cdot\|_{\Phi_1,p}$ when it is necessary to specify the base point $p$.

In probabilistic terms, the set $M^{\Phi_1}(p)$ has the characterisation given in the
following lemma, whose proof is a simple adaptation of the one given in [?] for
the case of $L^{\Phi_1}(p)$.

Lemma 4 $M^{\Phi_1}(p)$ coincides with the set of random variables for which the
moment generating function $m_u(t)$ is finite for all $t \in \mathbb{R}$.

Proof: If $u \in M^{\Phi_1}(p)$, then

$$\int_\Omega \Phi_1(tu)\,p\,d\mu = \int_\Omega \left( \frac{e^{tu} + e^{-tu}}{2} - 1 \right) p\,d\mu < \infty, \qquad \text{for all } t > 0,$$

which implies

$$\int_\Omega e^{tu} p\,d\mu < \infty, \qquad \text{for all } t \in \mathbb{R}.$$

Conversely, if $m_u(t) < \infty$ for all $t \in \mathbb{R}$, then both $\int_\Omega e^{tu} p\,d\mu$ and $\int_\Omega e^{-tu} p\,d\mu$ are
finite, so $\int_\Omega \Phi_1(tu)\,p\,d\mu < \infty$ for all $t > 0$, which means that $u \in M^{\Phi_1}(p)$.

In particular, the moment generating functional $Z_p(u) = \int_\Omega e^u p\,d\mu$ (otherwise
known as the partition function) is finite on the whole of $M^{\Phi_1}(p)$.
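Let $\mathcal{V}_p$ be the open unit ball of $B_p$ and consider the map

$$e_p : \mathcal{V}_p \to \mathcal{M}, \qquad u \mapsto \frac{e^u}{Z_p(u)}\,p.$$

Denote by $\mathcal{U}_p$ the image of $\mathcal{V}_p$ under $e_p$. We verify that $e_p$ is a bijection from
$\mathcal{V}_p$ to $\mathcal{U}_p$, since

$$\frac{e^u}{Z_p(u)}\,p = \frac{e^v}{Z_p(v)}\,p$$

implies that $(u - v)$ is a constant random variable. But since $(u - v) \in B_p$, we
must have $u = v$. Then let $e_p^{-1}$ be the inverse of $e_p$ on $\mathcal{U}_p$. One can check that

$$e_p^{-1} : \mathcal{U}_p \to B_p, \qquad q \mapsto \log\left(\frac{q}{p}\right) - \int_\Omega \log\left(\frac{q}{p}\right) p\,d\mu,$$

and also that, for any $p_1, p_2 \in \mathcal{M}$,

$$e_{p_2}^{-1} e_{p_1} : e_{p_1}^{-1}(\mathcal{U}_{p_1} \cap \mathcal{U}_{p_2}) \to e_{p_2}^{-1}(\mathcal{U}_{p_1} \cap \mathcal{U}_{p_2}),$$
$$u \mapsto u + \log\left(\frac{p_1}{p_2}\right) - \int_\Omega \left( u + \log\frac{p_1}{p_2} \right) p_2\,d\mu.$$

Now suppose that $q \in \mathcal{U}_{p_1} \cap \mathcal{U}_{p_2}$ for some $p_1, p_2 \in \mathcal{M}$. Then we can write it as

$$q = \frac{e^u}{Z_{p_1}(u)}\,p_1,$$

for some $u \in \mathcal{V}_{p_1}$. Using the formula just obtained, we find

$$e_{p_2}^{-1}(q) = u + \log\left(\frac{p_1}{p_2}\right) - \int_\Omega \left( u + \log\frac{p_1}{p_2} \right) p_2\,d\mu.$$

Since $e_{p_2}^{-1}(q) \in \mathcal{V}_{p_2}$, we have that

$$\left\| e_{p_2}^{-1}(q) \right\|_{\Phi_1,p_2} = \left\| u + \log\left(\frac{p_1}{p_2}\right) - \int_\Omega \left( u + \log\frac{p_1}{p_2} \right) p_2\,d\mu \right\|_{\Phi_1,p_2} < 1.$$

Consider an open ball of radius $r$ around $u = e_{p_1}^{-1}(q) \in e_{p_1}^{-1}(\mathcal{U}_{p_1} \cap \mathcal{U}_{p_2})$ in the
topology of $B_{p_1}$, that is, consider the set

$$A_r = \left\{ v \in B_{p_1} : \|v - u\|_{\Phi_1,p_1} < r \right\}$$

and let $r$ be small enough so that $A_r \subset \mathcal{V}_{p_1}$. Then the image in $\mathcal{M}$ of each point
$v \in A_r$ under $e_{p_1}$ is

$$\tilde{q} = \frac{e^v}{Z_{p_1}(v)}\,p_1.$$

We claim that $\tilde{q} \in \mathcal{U}_{p_1} \cap \mathcal{U}_{p_2}$. Indeed, applying $e_{p_2}^{-1}$ to it we find

$$e_{p_2}^{-1}(\tilde{q}) = v + \log\left(\frac{p_1}{p_2}\right) - \int_\Omega \left( v + \log\frac{p_1}{p_2} \right) p_2\,d\mu,$$

so

$$\begin{aligned}
\left\| e_{p_2}^{-1}(\tilde{q}) \right\|_{\Phi_1,p_2}
&\leq \|v - u\|_{\Phi_1,p_2} + \left\| u + \log\left(\frac{p_1}{p_2}\right) - \int_\Omega \left( u + \log\frac{p_1}{p_2} \right) p_2\,d\mu \right\|_{\Phi_1,p_2} + \left\| \int_\Omega (v - u)\,p_2\,d\mu \right\|_{\Phi_1,p_2} \\
&\leq \|v - u\|_{\Phi_1,p_2} + \left\| e_{p_2}^{-1}(q) \right\|_{\Phi_1,p_2} + \left\| \int_\Omega |v - u|\,p_2\,d\mu \right\|_{\Phi_1,p_2} \\
&= \|v - u\|_{\Phi_1,p_2} + \left\| e_{p_2}^{-1}(q) \right\|_{\Phi_1,p_2} + \|v - u\|_{1,p_2} K,
\end{aligned}$$

where $K = \|1\|_{\Phi_1,p_2}$ and we use the notation $\|\cdot\|_{1,p_2}$ for the $L^1(p_2)$-norm. It
follows from the growth properties of $\Phi_1$ that there exists $c_1 > 0$ such that
$\|f\|_{1,p_2} \leq c_1 \|f\|_{\Phi_1,p_2}$. Moreover, it was found in [13, 12] that $L^{\Phi_1}(p_1) = L^{\Phi_1}(p_2)$,
so there exists a constant $c_2 > 0$ such that $\|f\|_{\Phi_1,p_2} \leq c_2 \|f\|_{\Phi_1,p_1}$. Therefore, the
previous inequality becomes

$$\begin{aligned}
\left\| e_{p_2}^{-1}(\tilde{q}) \right\|_{\Phi_1,p_2}
&\leq c_2 \|v - u\|_{\Phi_1,p_1} + \left\| e_{p_2}^{-1}(q) \right\|_{\Phi_1,p_2} + c_1 c_2 K \|v - u\|_{\Phi_1,p_1} \\
&= c_2 (1 + c_1 K) \|v - u\|_{\Phi_1,p_1} + \left\| e_{p_2}^{-1}(q) \right\|_{\Phi_1,p_2}.
\end{aligned}$$

Thus, if we choose

$$r < \frac{1 - \left\| e_{p_2}^{-1}(q) \right\|_{\Phi_1,p_2}}{c_2 (1 + c_1 K)},$$

we will have that

$$\left\| e_{p_2}^{-1}(\tilde{q}) \right\|_{\Phi_1,p_2} < 1,$$

which proves the claim. What we have just proved is that $e_{p_1}^{-1}(\mathcal{U}_{p_1} \cap \mathcal{U}_{p_2})$ consists
entirely of interior points in the topology of $B_{p_1}$, so $e_{p_1}^{-1}(\mathcal{U}_{p_1} \cap \mathcal{U}_{p_2})$ is open in $B_{p_1}$.

We then have that the collection $\{(\mathcal{U}_p, e_p^{-1}), p \in \mathcal{M}\}$ satisfies the three axioms
for being a $C^\infty$-atlas for $\mathcal{M}$ (see [10, p 20]). Moreover, since all the spaces $B_p$ are
toplinear isomorphic, we can say that $\mathcal{M}$ is a $C^\infty$-manifold modelled on $B_p$.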

As usual, the tangent space at each point $p \in \mathcal{M}$ can be abstractly identified
with $B_p$. A concrete realisation has been given in [12, proposition 21], namely each
curve through $p \in \mathcal{M}$ is tangent to a one-dimensional exponential model
$\frac{e^{tu}}{Z_p(tu)}\,p$, so we take $u$ as the tangent vector representing the equivalence class
of such a curve.

Since we are using $M^{\Phi_1}$ instead of $L^{\Phi_1}$ to construct the manifold, we need
the following corresponding definition for the maximal exponential model at each
$p \in \mathcal{M}$:

$$\mathcal{E}(p) = \left\{ \frac{e^u}{Z_p(u)}\,p,\ u \in B_p \right\}.$$

One can verify that $\mathcal{E}(p)$ is the connected component of $\mathcal{M}$ containing $p$.

4 The Fisher Information and Dual Connections 

In the parametric version of information geometry, Amari and Nagaoka have
introduced the concept of dual connections with respect to a Riemannian metric.
For finite dimensional manifolds, any continuous assignment of a positive definite
symmetric bilinear form to each tangent space determines a Riemannian metric.
In infinite dimensions, we need to impose that the tangent space is self-dual and
that the bilinear form is continuous. Since our tangent spaces $B_p$ are not even
reflexive, let alone self-dual, we abandon the idea of having a Riemannian structure
on $\mathcal{M}$ and propose a weaker version of duality, the duality with respect to
a continuous scalar product. When restricted to finite dimensional submanifolds,
the scalar product becomes a Riemannian metric and the original definition of
duality is recovered.

Let $\langle \cdot, \cdot \rangle_p$ be a continuous positive definite symmetric bilinear form assigned
continuously to each $B_p \simeq T_p\mathcal{M}$. A pair of connections $(\nabla, \nabla^*)$ are said to be dual
with respect to $\langle \cdot, \cdot \rangle_p$ if

$$\langle \tau u, \tau^* v \rangle_q = \langle u, v \rangle_p \tag{5}$$

for all $u, v \in T_p\mathcal{M}$ and all smooth curves $\gamma : [0, 1] \to \mathcal{M}$ such that $\gamma(0) = p$, $\gamma(1) = q$,
where $\tau$ and $\tau^*$ denote the parallel transports associated with $\nabla$ and $\nabla^*$,
respectively. Equivalently, $(\nabla, \nabla^*)$ are dual with respect to $\langle \cdot, \cdot \rangle_p$ if

$$v\left( \langle s_1, s_2 \rangle_p \right) = \langle \nabla_v s_1, s_2 \rangle_p + \langle s_1, \nabla^*_v s_2 \rangle_p \tag{6}$$

for all $v \in T_p\mathcal{M}$ and all smooth vector fields $s_1$ and $s_2$.

We stress that this is not the kind of duality obtained when a connection $\nabla$
on a bundle $\mathcal{F}$ is used to construct another connection $\nabla'$ on the dual bundle $\mathcal{F}^*$
as defined, for instance, in [8, definition 6]. The latter is a construction that does
not involve any metric or scalar product and the two connections act on different
bundles, while Amari's duality is a duality with respect to a specific scalar product
(or metric, in the finite dimensional case) and the dual connections act on the same
bundle, the tangent bundle.

The infinite dimensional generalisation of the Fisher information is given by

$$\langle u, v \rangle_p = \int_\Omega (uv)\,p\,d\mu, \qquad \forall u, v \in B_p.$$

This is clearly bilinear, symmetric and positive definite. Also, since $L^{\Phi_1} \subset L^{\Phi_3}$,
the generalised Hölder inequality gives

$$|\langle u, v \rangle_p| \leq K \|u\|_{\Phi_1,p} \|v\|_{\Phi_1,p}, \qquad \forall u, v \in B_p,$$

which implies the continuity of $\langle \cdot, \cdot \rangle_p$.

The use of exponential Orlicz spaces to model the manifold induces naturally a
globally flat affine connection on the tangent bundle $T\mathcal{M}$, called the exponential
connection and denoted by $\nabla^{(1)}$. It is defined on each connected component of
the manifold $\mathcal{M}$, which is equivalent to saying that its parallel transport is defined
between points connected by an exponential model [13, theorem 4.1]. If $p$ and $q$
are two such points, then $L^{\Phi_1}(p) = L^{\Phi_1}(q)$ and the exponential parallel transport
is given by

$$\tau^{(1)}_{pq} : T_p\mathcal{M} \to T_q\mathcal{M}, \qquad u \mapsto u - \int_\Omega u q\,d\mu.$$

It is well defined, since $T_p\mathcal{M} = B_p$ and $T_q\mathcal{M} = B_q$ are subsets of the same set
$M^{\Phi_1}(p) = M^{\Phi_1}(q)$, so the exponential parallel transport just subtracts a constant
from $u$ to make it centred around the right point.

We now want to define the dual connection to $\nabla^{(1)}$ with respect to the Fisher
information. We begin by proving the following lemma.

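Lemma 7 Let $p$ and $q$ be two points in the same connected component of $\mathcal{M}$.
Then $\frac{p}{q} u \in B_q$, for all $u \in B_p$.

Proof: From the hypothesis, $u$ has absolutely continuous norm in $L^{\Phi_1}(p)$, so
for every $\varepsilon > 0$, there exists $\delta > 0$ such that $A \in \Sigma$ and $\mu(A) < \delta$ implies

$$\|u\chi_A\|_{\Phi_1,p} = \sup\left\{ \int_A |uv|\,p\,d\mu : v \in L^{\Phi_3}(p), \int_\Omega \Phi_3(v)\,p\,d\mu \leq 1 \right\} < \varepsilon. \tag{8}$$

But since $M^{\Phi_1}(p) = M^{\Phi_1}(q)$, as they are the completion of the same set $L^\infty$
under equivalent norms (recall that $p$ and $q$ are supposed to be connected by an
exponential model), we have that $L^{\Phi_3}(p) = (M^{\Phi_1}(p))^* = (M^{\Phi_1}(q))^* = L^{\Phi_3}(q)$, in
the sense that they are the same set furnished with equivalent norms. We then
use (8) to conclude that

$$\begin{aligned}
\varepsilon &> \sup\left\{ \int_A |uv|\,p\,d\mu : v \in L^{\Phi_3}(p), \int_\Omega \Phi_3(v)\,p\,d\mu \leq 1 \right\} \\
&= \sup\left\{ \int_A |uv|\,p\,d\mu : v \in L^{\Phi_3}(p), N_{\Phi_3,p}(v) \leq 1 \right\} \\
&= \frac{1}{k} \sup\left\{ \int_A \left| \frac{p}{q} u (kv) \right| q\,d\mu : v \in L^{\Phi_3}(p), N_{\Phi_3,p}(v) \leq 1 \right\} \\
&\geq \frac{1}{k} \sup\left\{ \int_A \left| \frac{p}{q} u\,v' \right| q\,d\mu : v' \in L^{\Phi_3}(q), N_{\Phi_3,q}(v') \leq 1 \right\} \\
&= \frac{1}{k} \left\| \frac{p}{q} u \chi_A \right\|_{\Phi_1,q},
\end{aligned}$$

where we used the facts that $\int_\Omega \Phi_3(v)\,p\,d\mu \leq 1$ iff $N_{\Phi_3,p}(v) \leq 1$ [14, theorem 3.2.3] and
that there exists a constant $k$ such that $N_{\Phi_3,p}(\cdot) \leq k N_{\Phi_3,q}(\cdot)$. Since $\varepsilon$ was arbitrary,
this proves that $\frac{p}{q} u$ has absolutely continuous norm in $L^{\Phi_1}(q)$. The lemma then
follows from lemma 3 and the fact that $\frac{p}{q} u$ is centred around $q$.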

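We can then define the mixture connection on $T\mathcal{M}$, as

$$\tau^{(-1)}_{pq} : T_p\mathcal{M} \to T_q\mathcal{M}, \qquad u \mapsto \frac{p}{q} u,$$

for $p$ and $q$ in the same connected component of $\mathcal{M}$. We notice that it is also
globally flat and prove the following result.

Theorem 9 The connections $\nabla^{(1)}$ and $\nabla^{(-1)}$ are dual with respect to the Fisher
information.

Proof: We have that

$$\begin{aligned}
\left\langle \tau^{(1)} u, \tau^{(-1)} v \right\rangle_q
&= \left\langle u - \int_\Omega u q\,d\mu,\ \frac{p}{q} v \right\rangle_q \\
&= \int_\Omega u \frac{p}{q} v\,q\,d\mu - \left( \int_\Omega u q\,d\mu \right) \int_\Omega \frac{p}{q} v\,q\,d\mu \\
&= \int_\Omega u v\,p\,d\mu \\
&= \langle u, v \rangle_p, \qquad \forall u, v \in B_p,
\end{aligned}$$

where, to go from the second to the third line above, we used that $v$ is centred
around $p$.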
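The duality computation can be replayed on a finite sample space, where every integral is a finite sum and all positive densities lie in one connected component. The sketch below is illustrative only: the densities, the vectors and the helper names (`fisher`, `centre`, `transport_exp`, `transport_mix`) are assumptions made for the example, with densities taken with respect to counting measure.

```python
def fisher(u, v, p):
    """<u, v>_p = sum_i u_i v_i p_i (Fisher scalar product on a finite space)."""
    return sum(a * b * w for a, b, w in zip(u, v, p))


def centre(u, p):
    """Subtract the p-expectation so that the result lies in B_p."""
    mean = sum(a * w for a, w in zip(u, p))
    return [a - mean for a in u]


def transport_exp(u, q):
    """Exponential parallel transport: u -> u - E_q[u]."""
    return centre(u, q)


def transport_mix(u, p, q):
    """Mixture parallel transport: u -> (p/q) u."""
    return [a * pi / qi for a, pi, qi in zip(u, p, q)]


p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]
u = centre([1.0, -2.0, 0.5, 3.0], p)
v = centre([0.3, 1.0, -1.0, 2.0], p)

lhs = fisher(transport_exp(u, q), transport_mix(v, p, q), q)
rhs = fisher(u, v, p)
```

Here `lhs` and `rhs` agree up to rounding, and the mixture-transported vector is centred around `q`, which is the finite dimensional shadow of Lemma 7.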

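5 Covariant Derivatives and Geodesics

We begin this section recalling that the covariant derivative for the exponential
connection has been computed in [?, proposition 25] and found to be

$$\left( \nabla^{(1)}_v s \right)(p) = (d_v s)(p) - E_p\left( (d_v s)(p) \right), \tag{10}$$

where $s \in S(T\mathcal{M})$ is a differentiable vector field, $v \in T_p\mathcal{M}$ is a tangent vector
at $p$, $E_p(\cdot)$ denotes the expected value with respect to the measure $p\,d\mu$ and $d_v s$
denotes the directional derivative in $M^{\Phi_1}$ of $s$ composed with some patch $e_p$ as a
map between Banach spaces.

We first notice that this gives the usual covariant derivative for the exponential
connection in parametric information geometry [?, pp 117-118]. For if $\{\theta^1, \ldots, \theta^n\}$
is a coordinate system in a finite dimensional submanifold of $\mathcal{M}$ we can put
$v = \frac{\partial}{\partial \theta^i}$ and $s = \frac{\partial \log p}{\partial \theta^j}$ as the 1-representation (see [?, p 41]) of the vector field
$\frac{\partial}{\partial \theta^j}$, then (10) reduces to

$$\left( \nabla^{(1)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \theta^j} \right)(p) = \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} - E_p\left( \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} \right),$$

which is the classical finite dimensional result. Accordingly, parametric exponential
models are flat submanifolds of $\mathcal{M}$.

We can also verify that one-dimensional exponential models of the form

$$q(t) = \frac{e^{tu}}{Z_p(tu)}\,p$$

are geodesics for $\nabla^{(1)}$, since if $s(t) = \frac{d}{dt} \log\left( \frac{q(t)}{p} \right)$ is the vector field tangent to $q(t)$
at each point $t$ [12, proposition 21], then (10) gives

$$\left( \nabla^{(1)}_{\dot{q}(t)} s \right)(q(t)) = \frac{d^2}{dt^2} \log\left( \frac{q(t)}{p} \right) - E_{q(t)}\left( \frac{d^2}{dt^2} \log\left( \frac{q(t)}{p} \right) \right) = 0,$$

because $\frac{d^2}{dt^2} \log\left( \frac{q(t)}{p} \right) = -\frac{d^2}{dt^2} \log Z_p(tu)$ is a constant random variable.

As we emphasised in the previous section, the definition given in [8, definition
22] for the mixture connection differs from ours (due to the different concepts of
duality employed), so we have to compute its covariant derivative according to the
definition given here, at least to have the notation right.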

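Proposition 11 Let $\gamma : (-\varepsilon, \varepsilon) \to \mathcal{M}$ be a smooth curve such that $p = \gamma(0)$ and
$v = (e_p^{-1} \circ \gamma)'(0)$ and let $s \in S(T\mathcal{M})$ be a differentiable vector field. Then

$$\left( \nabla^{(-1)}_v s \right)(p) = (d_v s)(p) + s(p)\,\ell'(0), \tag{12}$$

where $\ell(t) = \log(\gamma(t))$.

Proof:

$$\begin{aligned}
\left( \nabla^{(-1)}_v s \right)(p)
&= \left( \nabla^{(-1)}_{(e_p^{-1} \circ \gamma)'(0)} s \right)(\gamma(0)) \\
&= \lim_{h \to 0} \frac{1}{h} \left[ \tau^{(-1)}_{\gamma(h)\gamma(0)} s(\gamma(h)) - s(\gamma(0)) \right] \\
&= \lim_{h \to 0} \frac{1}{h} \left[ \frac{\gamma(h)}{\gamma(0)} s(\gamma(h)) - s(\gamma(0)) \right] \\
&= \lim_{h \to 0} \frac{1}{h} \left[ s(\gamma(h)) - s(\gamma(0)) \right] + \lim_{h \to 0} \frac{1}{h} \left[ \frac{\gamma(h) - \gamma(0)}{\gamma(0)} \right] s(\gamma(h)) \\
&= (d_v s)(p) + s(p)\,\ell'(0).
\end{aligned}$$

Again this reduces to the parametric result for the case of submanifolds of $\mathcal{M}$.
Put $v = \frac{\partial}{\partial \theta^i}$ and $s = \frac{\partial \log p}{\partial \theta^j}$ in (12) to obtain

$$\left( \nabla^{(-1)}_{\frac{\partial}{\partial \theta^i}} \frac{\partial}{\partial \theta^j} \right)(p) = \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} + \frac{\partial \log p}{\partial \theta^i} \frac{\partial \log p}{\partial \theta^j},$$

which is the classical finite dimensional result.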

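The mixture connection owes its name to the fact that in the parametric version
of information geometry a convex mixture of two densities describes a geodesic with
respect to $\nabla^{(-1)}$. To verify the same statement in the non-parametric case, we first
need to check that a convex mixture of two points in a connected component of
$\mathcal{M}$ remains in the same connected component.

Proposition 13 If $q_1$ and $q_2$ are two points in $\mathcal{E}(p)$ for some $p \in \mathcal{M}$, then

$$q(t) = t q_1 + (1 - t) q_2$$

belongs to $\mathcal{E}(p)$ for all $t \in (0, 1)$.

Proof: We begin by writing

$$q_1 = \frac{e^{u_1}}{Z_p(u_1)}\,p \qquad \text{and} \qquad q_2 = \frac{e^{u_2}}{Z_p(u_2)}\,p,$$

for some $u_1, u_2 \in B_p \subset M^{\Phi_1}(p)$. To simplify the notation, let us define

$$\tilde{u}_1 = u_1 - \log Z_p(u_1) \qquad \text{and} \qquad \tilde{u}_2 = u_2 - \log Z_p(u_2).$$

We want to show that, if we write

$$e^{\tilde{u}} p = q(t) = t e^{\tilde{u}_1} p + (1 - t) e^{\tilde{u}_2} p,$$

then $\tilde{u}$ is an element of $M^{\Phi_1}(p)$, so that

$$u = \tilde{u} - \int_\Omega \tilde{u}\,p\,d\mu \in B_p$$

and

$$q(t) = \frac{e^u}{Z_p(u)}\,p.$$

All we need to prove is that both $\int_\Omega e^{k\tilde{u}} p\,d\mu$ and $\int_\Omega e^{-k\tilde{u}} p\,d\mu$ are finite for all
$k > 0$. We have that

$$e^{\tilde{u}} = t e^{\tilde{u}_1} + (1 - t) e^{\tilde{u}_2},$$

which implies

$$e^{k\tilde{u}} = \left( t e^{\tilde{u}_1} + (1 - t) e^{\tilde{u}_2} \right)^k \leq 2^k \left( t^k e^{k\tilde{u}_1} + (1 - t)^k e^{k\tilde{u}_2} \right).$$

Thus

$$\int_\Omega e^{k\tilde{u}} p\,d\mu \leq 2^k \left( t^k \int_\Omega e^{k\tilde{u}_1} p\,d\mu + (1 - t)^k \int_\Omega e^{k\tilde{u}_2} p\,d\mu \right) < \infty,$$

since both $\tilde{u}_1$ and $\tilde{u}_2$ are in $M^{\Phi_1}(p)$. As for the other integral, observe that

$$e^{-k\tilde{u}} = \frac{1}{\left( t e^{\tilde{u}_1} + (1 - t) e^{\tilde{u}_2} \right)^k} \leq \frac{1}{t^k e^{k\tilde{u}_1}} = t^{-k} e^{-k\tilde{u}_1}.$$

Therefore

$$\int_\Omega e^{-k\tilde{u}} p\,d\mu \leq t^{-k} \int_\Omega e^{-k\tilde{u}_1} p\,d\mu < \infty,$$

since $\tilde{u}_1 \in M^{\Phi_1}(p)$.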

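We can now verify that a family of the form

$$q(t) = t q_1 + (1 - t) q_2, \qquad t \in (0, 1)$$

is a geodesic for $\nabla^{(-1)}$. Let $s(t) = \frac{d}{dt} \log\left( \frac{q(t)}{p} \right)$ be the vector field tangent to $q(t)$
at each point $t$, then (12) gives

$$\begin{aligned}
\left( \nabla^{(-1)}_{(e_p^{-1} \circ q)'(t)} s \right)(q(t))
&= \frac{d^2}{dt^2} \log\left( \frac{t q_1 + (1 - t) q_2}{p} \right) + \frac{d}{dt} \log\left( \frac{t q_1 + (1 - t) q_2}{p} \right) \frac{d}{dt} \left[ \log(t q_1 + (1 - t) q_2) \right] \\
&= \frac{d}{dt} \left[ \frac{p}{t q_1 + (1 - t) q_2} \frac{(q_1 - q_2)}{p} \right] + \frac{p}{t q_1 + (1 - t) q_2} \frac{(q_1 - q_2)}{p} \frac{(q_1 - q_2)}{t q_1 + (1 - t) q_2} \\
&= -\frac{(q_1 - q_2)^2}{\left[ t q_1 + (1 - t) q_2 \right]^2} + \frac{(q_1 - q_2)^2}{\left[ t q_1 + (1 - t) q_2 \right]^2} \\
&= 0.
\end{aligned}$$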
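The cancellation above can also be observed numerically on a finite sample space. The following sketch evaluates the right-hand side of (12) along a mixture curve, approximating $d_v s$ by a central difference in $t$. The densities `q1`, `q2` used in the check and the function names are illustrative assumptions, not objects from the paper.

```python
def mixture(q1, q2, t):
    """q(t) = t*q1 + (1 - t)*q2, componentwise."""
    return [t * a + (1.0 - t) * b for a, b in zip(q1, q2)]


def score(q1, q2, t):
    """s(t) = d/dt log(q(t)/p) = (q1 - q2)/q(t); the reference density p drops out."""
    return [(a - b) / c for a, b, c in zip(q1, q2, mixture(q1, q2, t))]


def mixture_covariant_derivative(q1, q2, t, h=1e-5):
    """Right-hand side of (12) along the mixture curve:
    (d/dt) s(t) + s(t) * l'(t), where l(t) = log q(t), so l'(t) = s(t).
    The t-derivative of s is approximated by a central difference."""
    s_plus = score(q1, q2, t + h)
    s_minus = score(q1, q2, t - h)
    s_now = score(q1, q2, t)
    return [(a - b) / (2.0 * h) + c * c
            for a, b, c in zip(s_plus, s_minus, s_now)]
```

Each component of the result should be zero up to discretisation and rounding error, in agreement with the computation in the text.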
6 $\alpha$-connections

In this section, we define the infinite dimensional analogue of the $\alpha$-connections
introduced in the parametric case independently by Chentsov [?] and Amari [?].
We use the same technique proposed by Gibilisco and Pistone [8], namely exploring
the geometry of spheres in the Lebesgue spaces $L^r$, but modified in such a way
that the resulting connections all act on the tangent bundle $T\mathcal{M}$.
We begin with Amari's $\alpha$-embeddings

$$\ell_\alpha : \mathcal{M} \to L^r(\mu), \qquad p \mapsto \frac{2}{1 - \alpha}\,p^{\frac{1 - \alpha}{2}}, \qquad \alpha \in (-1, 1),$$

where $r = \frac{2}{1 - \alpha}$.

Observe that

$$\|\ell_\alpha(p)\|_r = \left[ \int_\Omega \ell_\alpha(p)^r\,d\mu \right]^{1/r} = \left[ \int_\Omega \left( \frac{2}{1 - \alpha}\,p^{\frac{1 - \alpha}{2}} \right)^r d\mu \right]^{1/r} = r,$$

so $\ell_\alpha(p) \in S^r(\mu)$, the sphere of radius $r$ in $L^r(\mu)$ (we warn the reader that throughout
this paper, the $r$ in $S^r$ refers to the fact that this is a sphere of radius $r$, while
the fact that it is a subset of $L^r$ is judiciously omitted from the notation).
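As a quick sanity check of the norm computation, the sketch below evaluates the $\alpha$-embedding on a finite sample space with counting measure, so that the $L^r$ norm is a plain sum. The helper names `alpha_embedding` and `lr_norm` are assumptions for the example.

```python
def alpha_embedding(p, alpha):
    """Return (l_alpha(p), r) with l_alpha(p) = r * p^(1/r) and r = 2/(1 - alpha)."""
    r = 2.0 / (1.0 - alpha)
    return [r * x ** (1.0 / r) for x in p], r


def lr_norm(f, r):
    """L^r norm with respect to counting measure on a finite sample space."""
    return sum(abs(x) ** r for x in f) ** (1.0 / r)
```

For any probability vector `p` and any `alpha` in $(-1, 1)$, `lr_norm(*alpha_embedding(p, alpha))` should return `r` up to rounding, which places the image on the sphere of radius $r$.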

According to Gibilisco and Pistone [8], the tangent space to $S^r(\mu)$ at a point
$f$ is

$$T_f S^r(\mu) = \left\{ g \in L^r(\mu) : \int_\Omega g f^*\,d\mu = 0 \right\},$$

where $f^* = \operatorname{sgn}(f)|f|^{r-1}$. In our case,

$$f = \ell_\alpha(p) = r p^{1/r},$$

so that

$$f^* = \left( r p^{1/r} \right)^{r-1} = r^{r-1} p^{1 - 1/r}.$$

Therefore, the tangent space to $S^r(\mu)$ at $r p^{1/r}$ is

$$T_{r p^{1/r}} S^r(\mu) = \left\{ g \in L^r(\mu) : \int_\Omega g\,p^{1 - 1/r}\,d\mu = 0 \right\}.$$

We now look for a concrete realisation of the push-forward of the map $\ell_\alpha$ when
the tangent space $T_p\mathcal{M}$ is identified with $B_p$ as in the previous sections. Since

$$\frac{d}{dt} \left( \frac{2}{1 - \alpha}\,p^{\frac{1 - \alpha}{2}} \right) = p^{\frac{1 - \alpha}{2}} \frac{d \log p}{dt},$$

the push-forward can be formally implemented as

$$(\ell_\alpha)_{*(p)} : T_p\mathcal{M} = B_p \to T_{r p^{1/r}} S^r(\mu), \qquad u \mapsto p^{\frac{1 - \alpha}{2}} u = p^{1/r} u.$$

For this to be well defined, we need to check that $p^{1/r} u$ is an element of $T_{r p^{1/r}} S^r(\mu)$.
Indeed, since $L^{\Phi_1}(p) \subset L^s(p)$ for all $s \geq 1$, we have that

$$\int_\Omega \left( p^{1/r} u \right)^r d\mu = \int_\Omega u^r p\,d\mu < \infty,$$

so $p^{1/r} u \in L^r(\mu)$. Moreover

$$\int_\Omega p^{1/r} u\,p^{1 - 1/r}\,d\mu = \int_\Omega u p\,d\mu = 0,$$

which verifies that $p^{1/r} u \in T_{r p^{1/r}} S^r(\mu)$.

The sphere $S^r(\mu)$ inherits a natural connection obtained by projecting the
trivial connection on $L^r(\mu)$ (the one where parallel transport is just the identity
map) onto its tangent space at each point. For each $f \in S^r(\mu)$, a canonical
projection from the tangent space $T_f L^r(\mu)$ onto the tangent space $T_f S^r(\mu)$ can be
uniquely defined, since the spaces $L^r(\mu)$ are uniformly convex [8], and is given by

$$\Pi_f : g \mapsto g - \left( r^{-r} \int_\Omega g f^*\,d\mu \right) f.$$

When $f = r p^{1/r}$ and $f^* = r^{r-1} p^{1 - 1/r}$, the formula above gives

$$\Pi_{r p^{1/r}} : g \mapsto g - \left( \int_\Omega g\,p^{1 - 1/r}\,d\mu \right) p^{1/r}.$$

We are now ready to define the $\alpha$-connections. In what follows, $\tilde{\nabla}$ is used to
denote the trivial connection on $L^r(\mu)$.

Definition 14 For $\alpha \in (-1, 1)$, let $\gamma : (-\varepsilon, \varepsilon) \to \mathcal{M}$ be a smooth curve such that
$p = \gamma(0)$ and $v = \dot{\gamma}(0)$ and let $s \in S(T\mathcal{M})$ be a differentiable vector field. The
$\alpha$-connection on $T\mathcal{M}$ is given by

$$\left( \nabla^\alpha_v s \right)(p) = (\ell_\alpha)^{-1}_{*(p)} \left[ \Pi_{r p^{1/r}} \tilde{\nabla}_{(\ell_\alpha)_{*(p)} v}\,(\ell_\alpha)_{*(\gamma(t))} s \right]. \tag{15}$$

A formula like (15) deserves a more wordy explanation. We take the vector
field $s$ and push it forward along the curve $\gamma$ to obtain $(\ell_\alpha)_{*(\gamma(t))} s$. Then we take
its covariant derivative with respect to the trivial connection $\tilde{\nabla}$ in the direction
of $(\ell_\alpha)_{*(p)} v$, the push-forward of the tangent vector $v$. The result is a vector in
$T_{r p^{1/r}} L^r(\mu)$, so we use the canonical projection $\Pi$ to obtain a vector in $T_{r p^{1/r}} S^r(\mu)$.
Finally, we pull it back to $T_p\mathcal{M}$ using $(\ell_\alpha)^{-1}_{*(p)}$.

The next theorem shows that the relation between the exponential, the mixture
and the $\alpha$-connections just defined is the same as in the parametric case. Its proof
resembles the calculation in the last pages of [8], except that all our connections act
on the same bundle, whereas in [8] each one is defined on its own bundle-connection
pair.

Theorem 16 The exponential, mixture and $\alpha$-connections on $T\mathcal{M}$ satisfy

$$\nabla^\alpha = \frac{1 + \alpha}{2} \nabla^{(1)} + \frac{1 - \alpha}{2} \nabla^{(-1)}. \tag{17}$$



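Proof: Let $\ell(t) = \log(\gamma(t))$ with $\gamma$, $s$, $p$ and $v$ as in definition 14. Before explicitly
computing the derivatives in (15), observe that since $s(\gamma(t)) \in B_{\gamma(t)}$ for each
$t \in (-\varepsilon, \varepsilon)$, we have

$$\frac{d}{dt} \int_\Omega s(\gamma(t))\,\gamma(t)\,d\mu = 0,$$

that is,

$$\int_\Omega \frac{d s(\gamma(t))}{dt}\,\gamma(t)\,d\mu = -\int_\Omega s(\gamma(t)) \frac{d \log \gamma(t)}{dt}\,\gamma(t)\,d\mu.$$

In particular, for $t = 0$, we get

$$\int_\Omega (d_v s)(p)\,p\,d\mu = -\int_\Omega s(p)\,\ell'(0)\,p\,d\mu. \tag{18}$$

We can now look more closely at (15). It reads

$$\begin{aligned}
\left( \nabla^\alpha_v s \right)(p)
&= p^{-1/r} \left[ \Pi_{r p^{1/r}} \tilde{\nabla}_{p^{1/r} v}\,\gamma(t)^{1/r} s(\gamma(t)) \right] \\
&= p^{-1/r} \left[ \Pi_{r p^{1/r}} \left. \frac{d}{dt} \left( \gamma(t)^{1/r} s(\gamma(t)) \right) \right|_{t=0} \right] \\
&= p^{-1/r} \left[ \Pi_{r p^{1/r}} \left( \frac{1}{r} p^{1/r} \left. \frac{d \log(\gamma(t))}{dt} \right|_{t=0} s(p) + p^{1/r} \left. \frac{d s(\gamma(t))}{dt} \right|_{t=0} \right) \right] \\
&= p^{-1/r} \left[ \Pi_{r p^{1/r}} \left( \frac{1}{r} p^{1/r} \ell'(0) s(p) + p^{1/r} (d_v s)(p) \right) \right] \\
&= \frac{1}{r} \ell'(0) s(p) + (d_v s)(p) - \int_\Omega \left[ \frac{1}{r} \ell'(0) s(p) + (d_v s)(p) \right] p\,d\mu.
\end{aligned}$$

At this point we make use of (18) in the integrand above to obtain

$$\begin{aligned}
\left( \nabla^\alpha_v s \right)(p)
&= \frac{1}{r} \ell'(0) s(p) + (d_v s)(p) + \left( \frac{1}{r} - 1 \right) \int_\Omega (d_v s)(p)\,p\,d\mu \\
&= \frac{1 + \alpha}{2} \left[ (d_v s)(p) - E_p\left( (d_v s)(p) \right) \right] + \frac{1 - \alpha}{2} \left[ (d_v s)(p) + s(p)\,\ell'(0) \right] \\
&= \frac{1 + \alpha}{2} \left( \nabla^{(1)}_v s \right)(p) + \frac{1 - \alpha}{2} \left( \nabla^{(-1)}_v s \right)(p).
\end{aligned}$$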
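The identity (17) can be tested numerically on a finite sample space by evaluating (15) directly and comparing it with the convex combination of (10) and (12). The sketch below is only an illustration under stated assumptions: the curve `gamma` is a mixture of two hypothetical densities `Q1`, `Q2` with respect to counting measure, the vector field is a fixed random variable `W` re-centred at each point of the curve, and all derivatives along the curve are central differences in $t$.

```python
import math

Q1 = [0.1, 0.2, 0.3, 0.4]     # illustrative densities (counting measure)
Q2 = [0.4, 0.3, 0.2, 0.1]
W = [1.0, -2.0, 0.5, 3.0]     # arbitrary random variable defining the field


def gamma(t):
    """Smooth curve in the manifold: a convex mixture of Q1 and Q2."""
    return [t * a + (1.0 - t) * b for a, b in zip(Q1, Q2)]


def field(t):
    """s(gamma(t)): W centred around gamma(t), hence an element of B_gamma(t)."""
    g = gamma(t)
    mean = sum(w * x for w, x in zip(W, g))
    return [w - mean for w in W]


def ddt(fn, t, h=1e-5):
    """Componentwise central difference of a vector-valued function of t."""
    return [(a - b) / (2.0 * h) for a, b in zip(fn(t + h), fn(t - h))]


def nabla_exp(t):
    """Formula (10): d_v s minus its expectation under p = gamma(t)."""
    p, d = gamma(t), ddt(field, t)
    mean = sum(a * x for a, x in zip(d, p))
    return [a - mean for a in d]


def nabla_mix(t):
    """Formula (12): d_v s + s * l'(t), with l(t) = log gamma(t)."""
    d = ddt(field, t)
    dl = ddt(lambda x: [math.log(c) for c in gamma(x)], t)
    return [a + s * b for a, s, b in zip(d, field(t), dl)]


def nabla_alpha(alpha, t):
    """Formula (15): push forward by p^(1/r), differentiate, project, pull back."""
    r = 2.0 / (1.0 - alpha)
    p = gamma(t)
    g = ddt(lambda x: [c ** (1.0 / r) * s
                       for c, s in zip(gamma(x), field(x))], t)
    coeff = sum(gi * pi ** (1.0 - 1.0 / r) for gi, pi in zip(g, p))
    return [(gi - coeff * pi ** (1.0 / r)) / pi ** (1.0 / r)
            for gi, pi in zip(g, p)]
```

For a given `alpha` and `t`, the list `nabla_alpha(alpha, t)` should agree componentwise with `(1 + alpha)/2 * nabla_exp(t) + (1 - alpha)/2 * nabla_mix(t)` up to discretisation error. In this particular example `nabla_exp(t)` is numerically zero, because the derivative of the re-centred field along the curve is a constant random variable.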
A direct application of (17) gives the following.

Corollary 19 The connections $\nabla^\alpha$ and $\nabla^{-\alpha}$ are dual with respect to the Fisher
information $\langle \cdot, \cdot \rangle_p$.